SPANNING TREE SIZE IN RANDOM BINARY SEARCH TREES

By Alois Panholzer and Helmut Prodinger

Technische Universität Wien and University of the Witwatersrand

This paper deals with the size of the spanning tree of $p$ randomly
chosen nodes in a binary search tree. It is shown via generating
function methods that, for fixed $p$, the (normalized) spanning tree
size converges in law to the normal distribution. The special case
$p = 2$ reproves the recent result (obtained by the contraction method
by Mahmoud and Neininger [Ann. Appl. Probab. 13 (2003) 253-276]) that
the distribution of distances in random binary search trees has a
Gaussian limit law. In the proof we use the fact that the spanning
tree size is closely related to the number of passes in Multiple
Quickselect. This parameter, in particular its first two moments, was
studied earlier by Panholzer and Prodinger [Random Structures
Algorithms 13 (1998) 189-209]. Here we also show that this normalized
parameter has, for a fixed number $p$ of order statistics, a Gaussian
limit law. For $p = 1$ this gives the well-known result that the depth
of a randomly selected node in a random binary search tree converges
in law to the normal distribution.

1. Introduction. In the papers [7] and [1] the distances between nodes
in random search trees, respectively, random recursive trees were studied.
It was proven in [7] that the (edge) distances $\Delta_n$ between two
randomly selected nodes in random binary search trees of size $n$ are
asymptotically normally (Gaussian) distributed, where the so-called random
permutation model was used as the model of randomness for the trees. This
means that every permutation of length $n$ is assumed to be equally likely
when generating a binary search tree; furthermore, for selecting nodes, all
$\binom{n}{2}$ pairs of nodes are assumed to be equally likely.

In [1] it was shown that the distribution of the distance $\Delta(i,n)$
between a fixed node $i$ and the node $n$ in a random recursive tree of
size $n$ is (for a fixed ratio $\rho := i/n$ with $0 < \rho < 1$)
asymptotically normally distributed.

A related parameter to the distance between two randomly selected nodes
is the Wiener index of a graph, which is defined to be the sum of all
distances between pairs of nodes in the considered graph. The Wiener index
was studied for certain families of graphs and, although the scaled mean
of this parameter must coincide with the mean distance of two randomly
selected nodes, it turned out that the Wiener index is asymptotically not
normally distributed for random recursive trees and random binary search
trees (see [8] and [5]).

In this paper we concentrate on random binary search trees and study
a natural extension of the distance between two randomly selected nodes,
namely the size (the number of nodes) of the spanning tree of $p$ randomly
chosen nodes in the tree. Again, we use the random permutation model for
the generation of the binary search trees and also assume that all
$\binom{n}{p}$ possibilities to select $p$ nodes in a tree of size $n$ are
equally likely. The selection of the $p$ nodes will thus be independent of
the chosen tree. The random variable $Y_{n,p}$, which counts the size of
the spanning tree of $p$ randomly selected nodes in a random binary search
tree of size $n$, is then a direct extension of $\Delta_n$, since the edge
distance between two nodes is nothing else than the size of the spanning
tree of these two nodes minus one, and thus
$\Delta_n \stackrel{(d)}{=} Y_{n,2} - 1$, where $\stackrel{(d)}{=}$ denotes
equality in distribution.
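The definition of the spanning tree size can be made concrete with a small sketch. It assumes that nodes are identified with their keys $1,\dots,n$ and that the tree is grown by inserting a permutation key by key; the helper names (`build_parents`, `ancestors`, `spanning_tree_size`, `distance`) are illustrative choices and do not come from the paper.

```python
import random

def build_parents(perm):
    """Insert the keys of perm into an empty BST; return {key: parent key}."""
    root = perm[0]
    parent = {root: None}
    left, right = {}, {}
    for key in perm[1:]:
        node = root
        while True:
            child = left if key < node else right
            if node in child:
                node = child[node]          # descend further
            else:
                child[node] = key           # attach as a new leaf
                parent[key] = node
                break
    return parent

def ancestors(parent, key):
    """Set of nodes on the path from key up to the root (key included)."""
    path = set()
    while key is not None:
        path.add(key)
        key = parent[key]
    return path

def spanning_tree_size(parent, nodes):
    """Number of nodes of the smallest subtree connecting the given nodes."""
    paths = [ancestors(parent, k) for k in nodes]
    union = set().union(*paths)
    common = set.intersection(*paths)
    # The common ancestors form a chain starting at the root; only its deepest
    # node (the lowest common ancestor) belongs to the spanning tree.
    return len(union) - len(common) + 1

def distance(parent, a, b):
    """Edge distance = spanning tree size of the two nodes minus one."""
    return spanning_tree_size(parent, [a, b]) - 1

if __name__ == "__main__":
    n, p = 1000, 3
    perm = random.sample(range(1, n + 1), n)      # random permutation model
    parent = build_parents(perm)
    chosen = random.sample(range(1, n + 1), p)    # uniformly chosen p-subset
    print(chosen, spanning_tree_size(parent, chosen))
```

Dropping the `- len(common) + 1` correction returns the number of nodes on root-to-node paths instead, which is the second parameter considered below.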

In the mathematical analysis of $Y_{n,p}$ we use the fact that it is closely
related to the random variable $X_{n,p}$, which counts the number of passes
required in the Multiple Quickselect algorithm to find a random $p$-order
statistic in a data file of length $n$ (see [9] and the references cited
therein for a description of this divide and conquer algorithm); the natural
probability model for the data is that their ranks form a random permutation
of $\{1,\dots,n\}$, and we assume further that all $\binom{n}{p}$ sets of
$p$-order statistics $\{1 \le i_1 < \cdots < i_p \le n\}$ are equally likely.
Then, by well-known relations between binary search trees and Quicksort-like
algorithms, $X_{n,p}$ is equal to the number of ascendants of $p$ randomly
chosen nodes in a random binary search tree of size $n$ or, equivalently, to
the size of the spanning tree spanned by the root and $p$ randomly chosen
nodes (where of course the root could have been chosen as well) in a random
binary search tree of size $n$. See Figure 1 for a comparison of both
parameters.

[Figure 1, panel captions: "Spanning tree of the nodes 7, 9 and 10 is of
size 4"; "5 passes of the Multiple Quickselect algorithm to find the ranks
7, 9 and 10".]

Fig. 1. A binary search tree with the two parameters under consideration.
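The correspondence between passes and ascendants can be illustrated with a short sketch. It assumes a simple variant of Multiple Quickselect that always takes the first element of the current file as pivot, so that the recursion mirrors the binary search tree obtained by inserting the data in the given order; the function names are hypothetical.

```python
import random

def quickselect_passes(data, ranks):
    """Count the partitioning passes needed to locate the elements with the
    given 1-based ranks in data (distinct values, first element as pivot)."""
    if not data or not ranks:
        return 0                                  # nothing to search for here
    pivot = data[0]
    smaller = [x for x in data[1:] if x < pivot]
    larger = [x for x in data[1:] if x > pivot]
    k = len(smaller) + 1                          # rank of the pivot
    left_ranks = [r for r in ranks if r < k]
    right_ranks = [r - k for r in ranks if r > k]
    return (1 + quickselect_passes(smaller, left_ranks)
              + quickselect_passes(larger, right_ranks))

def ascendant_count(perm, keys):
    """Insert perm into a BST and count the nodes lying on a path from the
    root to one of the given keys (the keys themselves included)."""
    parent = {perm[0]: None}
    left, right = {}, {}
    for key in perm[1:]:
        node = perm[0]
        while True:
            child = left if key < node else right
            if node not in child:
                child[node] = key
                parent[key] = node
                break
            node = child[node]
    marked = set()
    for key in keys:
        # Walk upwards until the root or an already marked node is reached.
        while key is not None and key not in marked:
            marked.add(key)
            key = parent[key]
    return len(marked)

if __name__ == "__main__":
    perm = random.sample(range(1, 11), 10)
    print(quickselect_passes(perm, [7, 9, 10]), ascendant_count(perm, [7, 9, 10]))
```

For a permutation of $\{1,\dots,n\}$ the ranks coincide with the keys, so both functions can be called with the same list of selected elements.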

The parameter $X_{n,p}$ was studied already in [9], where exact formulas for
the expectation and the variance were given. Here we show additionally
that $X_{n,p}$ is, for fixed $p \ge 1$, asymptotically normally distributed
(Philippe Flajolet mentioned that to Helmut Prodinger in 1998 without
working out the details).

For $Y_{n,p}$ we also derive exact formulas for the expectation and the
variance and show that $Y_{n,p}$ is, for fixed $p \ge 2$, asymptotically
normally distributed, where the special case $p = 2$ reproves that the
distances $\Delta_n$ between two randomly selected nodes in random binary
search trees of size $n$ are asymptotically normally distributed. Our
approach uses generating functions, singularity analysis and a central
limit theorem for combinatorial structures due to Hwang. It avoids the
difficulties which occur in [7] when showing the asymptotic normality of
$\Delta_n$ by the contraction method; these difficulties arise from the
degenerate nature of the distributional limit equation for $X_{n,1}$
(which was studied there to obtain the result for $\Delta_n$).

2. Passes in Multiple Quickselect and spanning tree size in binary search
trees. First we want to translate the close relation between $X_{n,p}$ and
$Y_{n,p}$ into an equation for suitable generating functions as described
below.

Here we denote with $\phi_{n,p,m} := \mathbb{P}\{X_{n,p} = m\}$ the
probability that exactly $m$ passes of the Multiple Quickselect algorithm
are required in order to find a random set of $p$-order statistics in a data
file of length $n$ and with $F_{n,p,m} := \mathbb{P}\{Y_{n,p} = m\}$ the
probability that the size of the spanning tree of $p$ randomly chosen nodes
in a binary search tree of size $n$ is exactly $m$. Using the recursive
structure of the search trees, we obtain for the generating functions
$\phi_p(z,v) = \sum_{n,m \ge 0} \binom{n}{p} \phi_{n,p,m} z^n v^m$,
respectively,
$F_p(z,v) = \sum_{n,m \ge 0} \binom{n}{p} F_{n,p,m} z^n v^m$,
for $p \ge 1$ the recurrences

\[
\frac{\partial}{\partial z}\phi_p(z,v)
  = v\sum_{i=0}^{p}\phi_i(z,v)\,\phi_{p-i}(z,v)
  + v\sum_{i=0}^{p-1}\phi_i(z,v)\,\phi_{p-1-i}(z,v)
\tag{1}
\]

and

\[
\frac{\partial}{\partial z}F_p(z,v)
  = v\sum_{i=1}^{p-1}\phi_i(z,v)\,\phi_{p-i}(z,v)
  + v\sum_{i=0}^{p-1}\phi_i(z,v)\,\phi_{p-1-i}(z,v)
  + 2F_0(z,v)\,F_p(z,v),
\tag{2}
\]

with the initial functions $\phi_0(z,v) = F_0(z,v) = \frac{1}{1-z}$. The
difference in the above recurrences reflects the difference between both
parameters, coming from the instance where the root is not selected and also
the left ($i = 0$), respectively, right ($i = p$) subtree of the root does
not contain a selected node. In that instance the root does not belong to
the spanning tree of the selected nodes, which accounts for the term
$2F_0(z,v)F_p(z,v)$ in (2).
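Read at the level of coefficients of $z^{n-1}$, the recurrences (1) and (2) determine the exact distributions of $X_{n,p}$ and $Y_{n,p}$ for small $n$. The following sketch carries this out with exact rational arithmetic; polynomials in $v$ are stored as coefficient lists, and all names (`pgf_tables`, `distribution`, the tables `a` and `b`) are illustrative and not part of the paper.

```python
from fractions import Fraction
from math import comb

def poly_add(a, b):
    size = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(size)]

def poly_mul(a, b):
    if not a or not b:
        return []                      # the empty list is the zero polynomial
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def pgf_tables(n_max, p_max):
    """a[n][p] and b[n][p] hold the coefficients (in v) of
    binom(n,p)*E[v^X_{n,p}] and binom(n,p)*E[v^Y_{n,p}]."""
    one = [Fraction(1)]
    a = [[[] for _ in range(p_max + 1)] for _ in range(n_max + 1)]
    b = [[[] for _ in range(p_max + 1)] for _ in range(n_max + 1)]
    for n in range(n_max + 1):
        a[n][0] = one                  # phi_0 = F_0 = 1/(1-z)
        b[n][0] = one
        if n == 0:
            continue
        for p in range(1, p_max + 1):
            not_sel_x, not_sel_y, sel, carry = [], [], [], []
            for k in range(n):         # k nodes in the left subtree
                m = n - 1 - k          # m nodes in the right subtree
                for i in range(p + 1): # root not selected
                    term = poly_mul(a[k][i], a[m][p - i])
                    not_sel_x = poly_add(not_sel_x, term)
                    if 0 < i < p:      # both subtrees hold selected nodes
                        not_sel_y = poly_add(not_sel_y, term)
                for i in range(p):     # root selected
                    sel = poly_add(sel, poly_mul(a[k][i], a[m][p - 1 - i]))
                carry = poly_add(carry, b[k][p])   # all p nodes on one side
            x_poly = [Fraction(0)] + poly_add(not_sel_x, sel)   # factor v
            y_poly = [Fraction(0)] + poly_add(not_sel_y, sel)   # factor v
            a[n][p] = [c / n for c in x_poly]
            b[n][p] = [c / n for c in poly_add(y_poly, [2 * c for c in carry])]
    return a, b

def distribution(table, n, p):
    """Probabilities P{parameter = m} for m = 0, 1, 2, ..."""
    return [c / comb(n, p) for c in table[n][p]]

if __name__ == "__main__":
    a, b = pgf_tables(6, 3)
    print("X_{6,3}:", distribution(a, 6, 3))
    print("Y_{6,3}:", distribution(b, 6, 3))
```

For instance, for $n = 3$ and $p = 2$ the spanning tree has size 2 with probability $2/3$ and size 3 with probability $1/3$, since every binary search tree with three nodes contains two adjacent pairs and one pair at distance two.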

Introducing the trivariate generating functions
$\Phi(z,u,v) = \sum_{p \ge 0} \phi_p(z,v) u^p$ and
$F(z,u,v) = \sum_{p \ge 0} F_p(z,v) u^p$, we obtain first from (1) a Riccati
differential equation

\[
\frac{\partial}{\partial z}\Phi(z,u,v)
  = v(1+u)\,\Phi^2(z,u,v) + \frac{1-v}{(1-z)^2},
\tag{3}
\]

with the initial value $\Phi(0,u,v) = 1$. The solution of this equation is
already given in [9],

\[
\Phi(z,u,v)
  = \frac{\Omega + 1 - 2v + (1-z)^{\Omega}(\Omega - 1 + 2v)}
         {\bigl(\Omega + 1 - 2v(1+u) + (1-z)^{\Omega}(\Omega - 1 + 2v(1+u))\bigr)(1-z)}
\tag{4}
\]

with

\[
\Omega = \sqrt{1 - 4(1+u)v(1-v)}.
\tag{5}
\]
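As a plausibility check, the closed form (4) with $\Omega = \sqrt{1-4(1+u)v(1-v)}$ can be tested numerically against the Riccati equation (3). The sketch below uses a central difference quotient for the $z$-derivative; the step size and the sample point are arbitrary choices made here.

```python
import cmath

def omega(u, v):
    """Omega from (5)."""
    return cmath.sqrt(1 - 4 * (1 + u) * v * (1 - v))

def Phi(z, u, v):
    """Closed form (4) of the trivariate generating function."""
    om = omega(u, v)
    w = (1 - z) ** om
    numerator = om + 1 - 2 * v + w * (om - 1 + 2 * v)
    denominator = (om + 1 - 2 * v * (1 + u)
                   + w * (om - 1 + 2 * v * (1 + u))) * (1 - z)
    return numerator / denominator

def riccati_residual(z, u, v, h=1e-5):
    """|dPhi/dz - (v(1+u)Phi^2 + (1-v)/(1-z)^2)| with a central difference."""
    derivative = (Phi(z + h, u, v) - Phi(z - h, u, v)) / (2 * h)
    rhs = v * (1 + u) * Phi(z, u, v) ** 2 + (1 - v) / (1 - z) ** 2
    return abs(derivative - rhs)

if __name__ == "__main__":
    print(riccati_residual(0.3, 0.2, 1.1))   # small, limited by the step size
    print(Phi(0.0, 0.2, 1.1))                # initial value Phi(0,u,v) = 1
```

At $v = 1$ one has $\Omega = 1$ and the formula collapses to $\Phi(z,u,1) = 1/(1 - z(1+u))$, the generating function of $\binom{n}{p}$.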

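For $F(z,u,v)$ we get from (2) the differential equation

\[
\frac{\partial}{\partial z}F(z,u,v)
  = v(1+u)\,\Phi^2(z,u,v) - \frac{2v}{1-z}\,\Phi(z,u,v)
  + \frac{2}{1-z}\,F(z,u,v) + \frac{v-1}{(1-z)^2}
\]

or, using (3),

\[
\frac{\partial}{\partial z}F(z,u,v)
  = \frac{\partial}{\partial z}\Phi(z,u,v) - \frac{2v}{1-z}\,\Phi(z,u,v)
  + \frac{2}{1-z}\,F(z,u,v) + \frac{2(v-1)}{(1-z)^2},
\]

with $F(0,u,v) = 1$.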
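The step from (2) to this differential equation can be spelled out as follows; the computation only uses (2), the definitions of $\Phi$ and $F$, and $F_0 = \phi_0 = 1/(1-z)$.

```latex
\begin{align*}
\sum_{p\ge 1}u^p\sum_{i=1}^{p-1}\phi_i\,\phi_{p-i}
  &= (\Phi-\phi_0)^2
   = \Phi^2-\frac{2}{1-z}\,\Phi+\frac{1}{(1-z)^2},\\
\sum_{p\ge 1}u^p\sum_{i=0}^{p-1}\phi_i\,\phi_{p-1-i}
  &= u\,\Phi^2,\\
\frac{\partial}{\partial z}\bigl(F-F_0\bigr)
  &= v\Bigl(\Phi^2-\frac{2}{1-z}\,\Phi+\frac{1}{(1-z)^2}\Bigr)
   + vu\,\Phi^2+\frac{2}{1-z}\bigl(F-F_0\bigr),\\
\frac{\partial}{\partial z}F
  &= v(1+u)\,\Phi^2-\frac{2v}{1-z}\,\Phi+\frac{2}{1-z}\,F
   + \underbrace{\frac{1}{(1-z)^2}+\frac{v}{(1-z)^2}-\frac{2}{(1-z)^2}}_{=\,(v-1)/(1-z)^2}.
\end{align*}
```

Replacing $v(1+u)\Phi^2$ by $\frac{\partial}{\partial z}\Phi - \frac{1-v}{(1-z)^2}$ via (3) then yields the second form, with the constant term $\frac{2(v-1)}{(1-z)^2}$.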

This equation then has the solution

\[
F(z,u,v) = \frac{1 + 2z(v-1)}{(1-z)^2}
  + \frac{1}{(1-z)^2}\int_0^z
    \Bigl(\frac{\partial}{\partial t}\Phi(t,u,v)
          - \frac{2v}{1-t}\,\Phi(t,u,v)\Bigr)(1-t)^2\,dt,
\tag{6}
\]

with $\Phi(z,u,v)$ given by (4).



3. Expectation and variance of the spanning tree size. From (6) it is
easy to obtain exact formulas for the expectation

\[
E_{n,p} = \mathbb{E}(Y_{n,p})
  = \frac{1}{\binom{n}{p}}\,[z^n u^p]\,
    \frac{\partial}{\partial v}F(z,u,v)\Big|_{v=1}
\]

and the second factorial moment

\[
M^{(2)}_{n,p} = \mathbb{E}\bigl(Y_{n,p}(Y_{n,p}-1)\bigr)
  = \frac{1}{\binom{n}{p}}\,[z^n u^p]\,
    \frac{\partial^2}{\partial v^2}F(z,u,v)\Big|_{v=1}
\]

(and thus also for the variance $V_{n,p}$) of $Y_{n,p}$, the size of the
spanning tree of $p$ randomly selected nodes in a random binary search tree
of size $n$.

Differentiating (6) with respect to $v$ and evaluating at $v = 1$ gives the
following equation for
$E(z,u) := \frac{\partial}{\partial v}F(z,u,v)\big|_{v=1}$:

\[
E(z,u) = \frac{2z}{(1-z)^2}
  + \frac{1}{(1-z)^2}\int_0^z
    \Bigl[\frac{4(1-t)(1+u)u^2}{(1-t(1+u))^3}\,\log\frac{1}{1-t}
          + \frac{\Xi}{(1-t(1+u))^3}\Bigr]dt,
\tag{7}
\]

with $\Xi = (-2+u) + (6+3u-3u^2-4u^3)t + (1+u)(2u^2-3u-6)t^2
+ (2+u)(1+u)^2t^3$.

This can be simplified to

\[
\begin{aligned}
E(z,u) ={}& \frac{2u(1+u)}{(1-z(1+u))^2}\,\log\frac{1}{1-z}
  - \frac{2u}{(1-z)^2(1+u)^2}\,\log\frac{1}{1-z(1+u)}\\
 &+ \frac{uz\,(1-2z-3u+z^2+uz-2u^2+2uz^2+3u^2z+u^2z^2)}
         {(1-z)^2(1+u)(1-z(1+u))^2}.
\end{aligned}
\tag{8}
\]

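To extract coefficients we use here and in the sequel the general formulas
(see, e.g., [3])

\[
[z^n]\,\frac{1}{(1-z)^{m+1}}\,\log\frac{1}{1-z}
  = \binom{n+m}{n}\,(H_{n+m}-H_m),
\]

\[
[z^n]\,\frac{1}{(1-z)^{m+1}}\,\log^2\frac{1}{1-z}
  = \binom{n+m}{n}\,\Bigl((H_{n+m}-H_m)^2
    - \bigl(H^{(2)}_{n+m}-H^{(2)}_m\bigr)\Bigr),
\]

where $H_n = \sum_{k=1}^{n}\frac{1}{k}$ and
$H^{(2)}_n = \sum_{k=1}^{n}\frac{1}{k^2}$ denote the first and second order
harmonic numbers.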
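Both extraction formulas are easy to confirm for small indices by multiplying truncated power series with exact rational coefficients. The sketch below does this; the function names are chosen here for illustration.

```python
from fractions import Fraction
from math import comb

def harmonic(n, order=1):
    """H_n (order=1) or the second order harmonic number H_n^(2) (order=2)."""
    return sum(Fraction(1, k ** order) for k in range(1, n + 1))

def series_mul(a, b):
    """Product of two truncated power series given as coefficient lists."""
    size = min(len(a), len(b))
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(size)]

def lhs_coefficients(m, power, n_max):
    """Coefficients of z^0..z^n_max in (1-z)^-(m+1) * log(1/(1-z))^power."""
    log_series = [Fraction(0)] + [Fraction(1, k) for k in range(1, n_max + 1)]
    result = [Fraction(comb(k + m, m)) for k in range(n_max + 1)]
    for _ in range(power):
        result = series_mul(result, log_series)
    return result

def rhs_log(n, m):
    return comb(n + m, n) * (harmonic(n + m) - harmonic(m))

def rhs_log_squared(n, m):
    first = harmonic(n + m) - harmonic(m)
    second = harmonic(n + m, 2) - harmonic(m, 2)
    return comb(n + m, n) * (first ** 2 - second)

if __name__ == "__main__":
    print(lhs_coefficients(2, 1, 5))
    print([rhs_log(n, 2) for n in range(6)])
```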



By lengthy, but routine calculations, we finally get for
$E_{n,p} = \frac{1}{\binom{n}{p}}[z^n u^p]E(z,u)$ an exact formula, which is
given in the next lemma:



Lemma 1. The expectation $E_{n,p} = \mathbb{E}(Y_{n,p})$ of the size of the
spanning tree of $p$ randomly chosen nodes in a random binary search tree of
size $n$ is for $p \ge 1$ given by

\[
\begin{aligned}
E_{n,p} ={}& \frac{2p(n+1)^2}{(n+2-p)(n+1-p)}\,(H_n-H_p)
  + \frac{2(2p-1)(n+1)}{(n+2-p)(n+1-p)} + 3 + 2p - \frac{2pn}{n+1-p}\\
 &+ \frac{2p(n+1)(-1)^p}{\binom{n}{p}}\,H_n
  + \frac{2p(n+1)(-1)^p}{\binom{n}{p}}
    \sum_{k=1}^{p-1}\frac{(-1)^k}{k}\binom{n}{k},
\end{aligned}
\]

and asymptotically for fixed $p \ge 2$ by

\[
E_{n,p} = 2p\log n + 2p\gamma - 2pH_p + 3 - 2p - \frac{2p}{p-1}
  + O\Bigl(\frac{\log n}{n}\Bigr).
\]

For $p = 1$, the formula simplifies to $E_{n,1} = 1$ as it should.
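The exact formula can be cross-checked for small $n$ by brute force, averaging the spanning tree size over all $n!$ insertion orders and all $\binom{n}{p}$ node sets. The sketch below does this with exact rationals; the closed form coded in `lemma1_mean` is the expression $\frac{2p(n+1)^2}{(n+2-p)(n+1-p)}(H_n-H_p) + \frac{2(2p-1)(n+1)}{(n+2-p)(n+1-p)} + 3 + 2p - \frac{2pn}{n+1-p} + \frac{2p(n+1)(-1)^p}{\binom{n}{p}}\bigl(H_n + \sum_{k=1}^{p-1}\frac{(-1)^k}{k}\binom{n}{k}\bigr)$, and all function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb

def build_parents(perm):
    """Insert perm into an empty BST; return {key: parent key}."""
    parent = {perm[0]: None}
    left, right = {}, {}
    for key in perm[1:]:
        node = perm[0]
        while True:
            child = left if key < node else right
            if node not in child:
                child[node] = key
                parent[key] = node
                break
            node = child[node]
    return parent

def spanning_tree_size(parent, nodes):
    paths = []
    for key in nodes:
        path = set()
        while key is not None:
            path.add(key)
            key = parent[key]
        paths.append(path)
    # Union of the root paths minus the strict ancestors of the common root.
    return len(set().union(*paths)) - len(set.intersection(*paths)) + 1

def mean_by_enumeration(n, p):
    """E(Y_{n,p}) under the random permutation model, by full enumeration."""
    total, count = Fraction(0), 0
    for perm in permutations(range(1, n + 1)):
        parent = build_parents(perm)
        for nodes in combinations(range(1, n + 1), p):
            total += spanning_tree_size(parent, nodes)
            count += 1
    return total / count

def harmonic(n):
    return sum(Fraction(1, k) for k in range(1, n + 1))

def lemma1_mean(n, p):
    """Closed form for E(Y_{n,p}), valid for 1 <= p <= n."""
    h_n, h_p = harmonic(n), harmonic(p)
    alternating = sum(Fraction((-1) ** k * comb(n, k), k) for k in range(1, p))
    return (Fraction(2 * p * (n + 1) ** 2, (n + 2 - p) * (n + 1 - p)) * (h_n - h_p)
            + Fraction(2 * (2 * p - 1) * (n + 1), (n + 2 - p) * (n + 1 - p))
            + 3 + 2 * p
            - Fraction(2 * p * n, n + 1 - p)
            + Fraction(2 * p * (n + 1) * (-1) ** p, comb(n, p)) * (h_n + alternating))

if __name__ == "__main__":
    for n in range(1, 6):
        print(n, [(p, lemma1_mean(n, p)) for p in range(1, n + 1)])
```

The enumeration is only feasible for very small $n$; it is meant as a consistency check, not as a way to compute $E_{n,p}$.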
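We remark that

\[
H_n = \sum_{k=1}^{n}\frac{(-1)^{k-1}}{k}\binom{n}{k},
\]

and so one can give the alternative formula

\[
\begin{aligned}
E_{n,p} ={}& \frac{2p(n+1)^2}{(n+2-p)(n+1-p)}\,(H_n-H_p)
  + \frac{2(2p-1)(n+1)}{(n+2-p)(n+1-p)} + 3 + 2p - \frac{2pn}{n+1-p}\\
 &- \frac{2p(n+1)(-1)^p}{\binom{n}{p}}
    \sum_{k=p}^{n}\frac{(-1)^k}{k}\binom{n}{k}.
\end{aligned}
\]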

When we differentiate equation (6) twice with respect to $v$ and evaluate at
$v = 1$, we finally obtain the following formula for
$M_2(z,u) := \frac{\partial^2}{\partial v^2}F(z,u,v)\big|_{v=1}$:

8u f z 1 1 
M 2 (z,u) = -- r=. / — - — -log- -dt 



(l-z) 2 7 l-i(l + it) 1-t 

4u(l + u) 2 (l-z + 2ii-u2:) i 2 1 
+ 71 71" — w5 lo § 



(l-z(l + n)) 3 b 1-z 



12u , 1 

(9) + 71 3571 I ^2 lQ g- 



(l-z) 2 (l + u) 2 + 

4u*i , 1 

+ 71 771 71— 77? lo § ■ 



+ 



(l-z)(l-z(l + u)) 3 to l 
2li 2 2^ 2 



(l-z) 2 (l-z(l+u)) 3 (l + u)' 
with the abbreviations 
$ x = - z 2 ti 2 - 3z 2 - 5z 2 ii + u 3 z 2 + 6z + 9zu + 2u 2 z - u 3 z - 3u 2 - 4u - 3, 
h 



qt 2 = - z 3 u 3 + 3u 3 z 2 - 2u 3 z - 19u 2 z + 22z V + 6u 2 - 3z 3 u 2 



+ Uu - 3z 3 u - 4Qzu + 3hz 2 u + 14 - 29z - z 3 + 16z 2 . 
Extracting coefficients gives, after a somewhat lengthy calculation, an exact
formula for the second factorial moment
$M^{(2)}_{n,p} = \frac{1}{\binom{n}{p}}[z^n u^p]M_2(z,u)$, and we get via
$V_{n,p} = M^{(2)}_{n,p} + E_{n,p} - E_{n,p}^2$ the following result:

Lemma 2. The variance $V_{n,p} = \mathbb{V}(Y_{n,p})$ of the size of the
spanning tree of $p$ chosen nodes in a random binary search tree of size $n$
is for $p \ge 2$ given by

4(-l)P( n + l)(2p_ff n - 2pH p + 2 - 3p 2 ) ^ (-l) fc 
P( 



^ y 7 V"- ~r ± )\^F- l - l n — ^P^p -r ^ — o/y ; x - 



\p' fc=l 
8(-l)P(n + l) ^ r 1 (-l) fc 



77, 

k 2 \k 

W k=l V 



4(-l)P(2-3p 2 )(n + l) 4* 



p( p ) (n + 4-p)i 



2*, 



p2 



$ 3 = -2p 4 - 6nV + 16p 3 - 2nV - 45p 2 n - 58p 2 - 4n 2 p 2 + 2p 2 n 4 

+ 7n 3 p 2 + 56p + 78np + Gn 3 p + 41pn 2 - n 4 p - 8 - 20n - 16n 2 - 4re 3 , 
^ 4 = -144 - 6p 5 - 3n 4 - I52p 3 n + 2p 4 ra 2 + 25p 4 n - 234n + 78p + Wnp 

- 5n 4 p - 39n 2 p 3 - 4n 3 p 3 + 250p 2 n + 119n V + 2p 2 n 4 + 25n 3 p 2 

- 22n 3 p - 35pn 2 + 155p 2 -173p 3 + 58p 4 - 153n 2 - 42n 3 . 

Further we have $V_{n,1} = 0$ and the following asymptotic expansion for
$n \to \infty$ and fixed $p \ge 2$:

\[
V_{n,p} = 2p\log n - 2p(H_p-\gamma)
  - 4p^2\Bigl(\frac{\pi^2}{6}-H^{(2)}_p\Bigr)
  + \frac{2(-2+7p-5p^2+2p^3)}{(1-p)^2}
  + O\Bigl(\frac{\log^2 n}{n}\Bigr).
\]

Here we used the abbreviation
$x^{\underline{m}} := x(x-1)\cdots(x-m+1)$ for the falling factorials.

We remark again that an alternative representation of the variance would
be possible using the additional formula

\[
\sum_{k=1}^{n}\frac{(-1)^{k-1}}{k^2}\binom{n}{k}
  = \frac{1}{2}\bigl(H_n^2 + H^{(2)}_n\bigr).
\]

4. The limiting distribution of the number of passes in Multiple
Quickselect. We will show that both random variables $X_{n,p}$ and
$Y_{n,p}$ satisfy, for fixed $p$, a Gaussian limit law. To do this, we will
expand the coefficients at $u^p$ (for fixed $p$) of the trivariate
generating functions $\Phi(z,u,v)$, respectively, $F(z,u,v)$ around their
dominant singularity $z = 1$, where the expansion holds uniformly for
$|v-1| \le \tau$, for $\tau > 0$. Singularity analysis (see [2]) of
generating functions then allows us to translate these expansions into an
asymptotic expansion of the moment generating function (the Laplace
transform) of the considered random variables. Then we can apply the
so-called Quasi power theorem (see [4]) to establish the weak convergence of
the random variables to the normal distribution with certain convergence
rates.

In this section we will treat the random variable $X_{n,p}$. As described
above, we are interested in an asymptotic expansion of
$\frac{1}{\binom{n}{p}}[z^n u^p]\Phi(z,u,v)$ for $n \to \infty$ and fixed
$p$, uniformly for $|v-1| \le \tau$, where $\Phi(z,u,v)$ is given by the
exact formula (4).

To expand $\Phi(z,u,v)$ we will use some auxiliary expansions of

\[
f(u) = \Omega + 1 - 2v + (1-z)^{\Omega}(\Omega - 1 + 2v),
\tag{10}
\]

\[
g(u) = \Omega + 1 - 2v(1+u) + (1-z)^{\Omega}(\Omega - 1 + 2v(1+u)),
\tag{11}
\]

with $\Omega$ given by (5), so that $\Phi(z,u,v) = \frac{f(u)}{(1-z)\,g(u)}$.
All $O$-terms in the expansions given below are uniform for
$|v-1| \le \tau$, as required. In the sequel, we will use the notations
$D_u$ for the differential operator w.r.t. $u$ and $N_u$ for the evaluation
operator at $u = 0$. Since

\[
\Omega = (2v-1)\sum_{k\ge 0}\binom{1/2}{k}
  \Bigl(\frac{-4v(1-v)}{(2v-1)^2}\Bigr)^{k}u^k,
\]

we get

\[
(1-z)^{\Omega} = e^{\Omega\log(1-z)}
  = (1-z)^{2v-1}\times
    \exp\Bigl((2v-1)\log(1-z)\sum_{k\ge 1}\binom{1/2}{k}
      \Bigl(\frac{-4v(1-v)}{(2v-1)^2}\Bigr)^{k}u^k\Bigr)
\]

and thus

\[
N_uD_u^p(1-z)^{\Omega} = O\bigl((1-z)^{2v-1}\log^p(1-z)\bigr).
\tag{12}
\]
We have

\[
f(0) = g(0) = (1-z)^{2v-1}\,2(2v-1)
\tag{13}
\]

and we get further

\[
N_uD_uf(u) = -\frac{4v(1-v)}{2(2v-1)}
  + O\bigl((1-z)^{2v-1}\log(1-z)\bigr),
\tag{14a}
\]

\[
N_uD_ug(u) = -\frac{2v^2}{2v-1}
  + O\bigl((1-z)^{2v-1}\log(1-z)\bigr),
\tag{14b}
\]

\[
N_uD_u^pf(u) = (2v-1)\,p!\binom{1/2}{p}
  \Bigl(\frac{-4v(1-v)}{1-4v(1-v)}\Bigr)^{p}
  + O\bigl((1-z)^{2v-1}\log^p(1-z)\bigr),
\tag{14c}
\]

\[
N_uD_u^pg(u) = (2v-1)\,p!\binom{1/2}{p}
  \Bigl(\frac{-4v(1-v)}{1-4v(1-v)}\Bigr)^{p}
  + O\bigl((1-z)^{2v-1}\log^p(1-z)\bigr),
\tag{14d}
\]

for $p \ge 2$. Furthermore, we want to expand $D_u^p(g(u))^{-1}$ (for
$p \ge 1$) in terms of falling powers of $(g(u))^{-1}$, which gives

\[
\begin{aligned}
D_u^p(g(u))^{-1} ={}& (-1)^p\,p!\,(g(u))^{-p-1}(g'(u))^p\\
 &+ \frac{(-1)^{p-1}(p-1)\,p!}{2}\,(g(u))^{-p}(g'(u))^{p-2}g''(u)
  + O\bigl((g(u))^{-p+1}\bigr)
\end{aligned}
\]

and hence we obtain the expansion

\[
\begin{aligned}
N_uD_u^p(g(u))^{-1}
 ={}& (-1)^p\,p!\,
   \frac{\bigl(-2v^2/(2v-1) + O((1-z)^{2v-1}\log(1-z))\bigr)^p}
        {(1-z)^{(p+1)(2v-1)}\,2^{p+1}(2v-1)^{p+1}}
   + O\Bigl(\frac{1}{(1-z)^{p(2v-1)}}\Bigr)\\
 ={}& \frac{(-1)^p\,p!\,\bigl(-2v^2/(2v-1)\bigr)^p}
        {(1-z)^{(p+1)(2v-1)}\,2^{p+1}(2v-1)^{p+1}}
   + O\Bigl(\frac{\log(1-z)}{(1-z)^{p(2v-1)}}\Bigr).
\end{aligned}
\tag{15}
\]
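This finally gives

\[
\begin{aligned}
N_uD_u^p\Phi(z,u,v)
 ={}& N_uD_u^p\,\frac{f(u)}{(1-z)\,g(u)}\\
 ={}& \frac{1}{1-z}\,f(0)\,N_uD_u^p(g(u))^{-1}
   + \frac{p}{1-z}\,f'(0)\,N_uD_u^{p-1}(g(u))^{-1}
   + \frac{1}{1-z}\,O\Bigl(\frac{1}{(1-z)^{(p-1)(2v-1)}}\Bigr)\\
 ={}& \frac{(1-z)^{2v-1}\,2(2v-1)\,(-1)^p\,p!\,\bigl(-2v^2/(2v-1)\bigr)^p}
        {(1-z)^{(p+1)(2v-1)+1}\,2^{p+1}(2v-1)^{p+1}}\\
 &+ \frac{p\,\bigl(-4v(1-v)/(2(2v-1))\bigr)\,(-1)^{p-1}(p-1)!\,
          \bigl(-2v^2/(2v-1)\bigr)^{p-1}}
        {(1-z)^{p(2v-1)+1}\,2^{p}(2v-1)^{p}}
   + O\Bigl(\frac{\log(1-z)}{(1-z)^{(p-1)(2v-1)+1}}\Bigr)\\
 ={}& \frac{p!\,\bigl(v/(2v-1)\bigr)^{2p-1}}{(1-z)^{p(2v-1)+1}}
   + O\Bigl(\frac{\log(1-z)}{(1-z)^{(p-1)(2v-1)+1}}\Bigr).
\end{aligned}
\tag{16}
\]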

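Singularity analysis leads then directly to

\[
\begin{aligned}
[z^n]\,N_uD_u^p\Phi(z,u,v)
 ={}& \frac{p!\,\bigl(v/(2v-1)\bigr)^{2p-1}\,n^{p(2v-1)}}{\Gamma(p(2v-1)+1)}
   \Bigl(1+O\Bigl(\frac{1}{n}\Bigr)\Bigr)
   + O\bigl((\log n)\,n^{(p-1)(2v-1)}\bigr)\\
 ={}& \frac{p!\,\bigl(v/(2v-1)\bigr)^{2p-1}\,n^{p(2v-1)}}{\Gamma(p(2v-1)+1)}
   \Bigl(1+O\Bigl(\frac{1}{n}\Bigr)\Bigr)
   \Bigl(1+O\Bigl(\frac{\log n}{n^{2v-1}}\Bigr)\Bigr)\\
 ={}& \frac{p!\,\bigl(v/(2v-1)\bigr)^{2p-1}\,n^{p(2v-1)}}{\Gamma(p(2v-1)+1)}
   \Bigl(1+O\Bigl(\frac{1}{n^{1-\varepsilon}}\Bigr)\Bigr),
\end{aligned}
\tag{17}
\]

uniformly for $\varepsilon > 0$ and $|v-1| \le \tau := \varepsilon/2$, and
also to the following expansion, which is valid for fixed $p \ge 1$:

\[
\begin{aligned}
\frac{1}{\binom{n}{p}}\,[z^nu^p]\,\Phi(z,u,v)
 ={}& \frac{p!}{n^p}\,[z^nu^p]\,\Phi(z,u,v)
   \Bigl(1+O\Bigl(\frac{1}{n}\Bigr)\Bigr)
  = \frac{1}{n^p}\,[z^n]\,N_uD_u^p\Phi(z,u,v)
   \Bigl(1+O\Bigl(\frac{1}{n}\Bigr)\Bigr)\\
 ={}& \frac{p!\,\bigl(v/(2v-1)\bigr)^{2p-1}\,n^{p(2v-2)}}{\Gamma(p(2v-1)+1)}
   \Bigl(1+O\Bigl(\frac{1}{n^{1-\varepsilon}}\Bigr)\Bigr)\\
 ={}& \exp\Bigl(p(2v-2)\log n
      + \log\frac{p!\,\bigl(v/(2v-1)\bigr)^{2p-1}}{\Gamma(p(2v-1)+1)}\Bigr)
   \times\Bigl(1+O\Bigl(\frac{1}{n^{1-\varepsilon}}\Bigr)\Bigr).
\end{aligned}
\tag{18}
\]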

We give here the Quasi power theorem as proven in [4], which we want
to apply to our problem.

Theorem 3 (H. K. Hwang). Let $\{\Omega_n\}_{n \ge 1}$ be a sequence of
integral random variables. Suppose that the moment generating function
satisfies the asymptotic expression

\[
M_n(s) := \mathbb{E}\bigl(e^{\Omega_n s}\bigr)
  = \sum_{m \ge 0}\mathbb{P}\{\Omega_n = m\}\,e^{ms}
  = e^{H_n(s)}\bigl(1 + O(\kappa_n^{-1})\bigr),
\]

the $O$-term being uniform for $|s| \le \tau$, $s \in \mathbb{C}$,
$\tau > 0$, where:

(i) $H_n(s) = u(s)\phi(n) + v(s)$, with $u(s)$ and $v(s)$ analytic for
$|s| \le \tau$ and independent of $n$; $u''(0) \ne 0$,

(ii) $\phi(n) \to \infty$,

(iii) $\kappa_n \to \infty$.

Under these assumptions, the distribution of $\Omega_n$ is asymptotically
Gaussian,

\[
\mathbb{P}\Bigl\{\frac{\Omega_n - u'(0)\phi(n)}{\sqrt{u''(0)\phi(n)}} < x\Bigr\}
  = \Phi(x) + O\Bigl(\frac{1}{\kappa_n} + \frac{1}{\sqrt{\phi(n)}}\Bigr),
\]

uniformly with respect to $x$, $x \in \mathbb{R}$. Here $\Phi(x)$ denotes
the distribution function of the standard normal distribution
$\mathcal{N}(0,1)$. Moreover, the mean and the variance of $\Omega_n$
satisfy

\[
\mathbb{E}(\Omega_n) = u'(0)\phi(n) + v'(0) + O(\kappa_n^{-1}),
\qquad
\mathbb{V}(\Omega_n) = u''(0)\phi(n) + v''(0) + O(\kappa_n^{-1}).
\]
From (18) we get, with the notation in Theorem 3 and $v = e^s$,

\[
u(s) = p(2e^s-2), \qquad
v(s) = \log\Bigl(\frac{p!\,\bigl(e^s/(2e^s-1)\bigr)^{2p-1}}
                      {\Gamma(p(2e^s-1)+1)}\Bigr), \qquad
\phi(n) = \log n, \qquad \kappa_n = n^{1-\varepsilon}.
\]

We find

\[
u'(0) = 2p, \qquad u''(0) = 2p,
\tag{19}
\]

and

\[
\begin{aligned}
v'(0) &= -2p + 1 - 2p\Psi(p+1) = -2pH_p + 2p\gamma + 1 - 2p,\\
v''(0) &= 2(2p-1) - 2p\Psi(p+1) - 4p^2\Psi'(p+1)\\
       &= 2(2p-1) - 2pH_p + 2p\gamma - \tfrac{2}{3}\pi^2p^2 + 4p^2H^{(2)}_p,
\end{aligned}
\tag{20}
\]

where $\Psi(x)$ denotes the digamma function: $\Psi(x) := (\log\Gamma(x))'$.
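The values of $v'(0)$ and $v''(0)$ can be sanity-checked numerically. The sketch below assumes $v(s) = \log\bigl(p!\,(e^s/(2e^s-1))^{2p-1}/\Gamma(p(2e^s-1)+1)\bigr)$, differentiates it by central differences, and compares with the closed forms $-2pH_p + 2p\gamma + 1 - 2p$ and $2(2p-1) - 2pH_p + 2p\gamma - \frac{2}{3}\pi^2p^2 + 4p^2H^{(2)}_p$; the step size is an arbitrary choice.

```python
import math

EULER_GAMMA = 0.5772156649015329

def v_passes(s, p):
    """v(s) for the number of passes X_{n,p}, written with log-gamma."""
    e = math.exp(s)
    return (math.lgamma(p + 1)
            + (2 * p - 1) * (s - math.log(2 * e - 1))
            - math.lgamma(p * (2 * e - 1) + 1))

def derivatives_at_zero(func, h=1e-3):
    """Central difference approximations of func'(0) and func''(0)."""
    first = (func(h) - func(-h)) / (2 * h)
    second = (func(h) - 2 * func(0.0) + func(-h)) / h ** 2
    return first, second

def closed_forms(p):
    h1 = sum(1 / k for k in range(1, p + 1))
    h2 = sum(1 / k ** 2 for k in range(1, p + 1))
    first = -2 * p * h1 + 2 * p * EULER_GAMMA + 1 - 2 * p
    second = (2 * (2 * p - 1) - 2 * p * h1 + 2 * p * EULER_GAMMA
              - (2 * math.pi ** 2 / 3) * p ** 2 + 4 * p ** 2 * h2)
    return first, second

if __name__ == "__main__":
    for p in (1, 2, 5):
        print(p, derivatives_at_zero(lambda s: v_passes(s, p)), closed_forms(p))
```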

Hence, with equations (19) and (20), we get from Theorem 3 the following
result:

Theorem 4. The distribution of the random variable $X_{n,p}$, which counts
the number of passes in the Multiple Quickselect algorithm that are required
to find a random set of $p$-order statistics in a data file of size $n$, is
for fixed $p \ge 1$ asymptotically Gaussian, where the convergence rate is
of order $O\bigl(\frac{1}{\sqrt{\log n}}\bigr)$, and the expectation
$E_{n,p} = \mathbb{E}(X_{n,p})$ and the variance
$V_{n,p} = \mathbb{V}(X_{n,p})$ satisfy

\[
E_{n,p} = 2p\log n + 1 - 2p - 2pH_p + 2p\gamma
  + O\Bigl(\frac{1}{n^{1-\varepsilon}}\Bigr),
\]

\[
V_{n,p} = 2p\log n + 4p - 2 - 2pH_p + 2p\gamma + 4p^2H^{(2)}_p
  - \tfrac{2}{3}\pi^2p^2 + O\Bigl(\frac{1}{n^{1-\varepsilon}}\Bigr).
\]

The result for $E_{n,p}$ and $V_{n,p}$ already appeared in [9], but
unfortunately there was a typo in the formula for $V_{n,p}$.

For the case $p = 1$ we have that $X_{n,1}$ counts the number of comparisons
encountered by a successful search in a random binary search tree, and this
is, up to an additive constant, the same as the depth $D_n$ of a randomly
selected node, $D_n \stackrel{(d)}{=} X_{n,1} - 1$. The asymptotic normality
of the distribution of $X_{n,1}$ is well known (see, e.g., [6]) and the
convergence rate was recently established in [7].

5. The limiting distribution of the spanning tree size in binary search
trees. In this section we will show that the normalized random variable
$Y_{n,p}$, as defined in Section 4, has for fixed $p$ a Gaussian limiting
distribution. Hence we are interested in an asymptotic expansion of
$\frac{1}{\binom{n}{p}}[z^nu^p]F(z,u,v)$ for $n \to \infty$ and fixed $p$,
uniformly for $|v-1| \le \tau$, where $F(z,u,v)$ is given by (6).

To do this, we will first study the behavior near the singularity $z = 1$ of
the expression

\[
\tilde\Phi(z,u,v) = \Bigl(\frac{\partial}{\partial z}\Phi(z,u,v)
  - \frac{2v}{1-z}\,\Phi(z,u,v)\Bigr)(1-z)^2,
\tag{21}
\]

which we can write as

\[
\tilde\Phi(z,u,v) = \frac{\tilde f(u)}{(g(u))^2},
\tag{22}
\]

where the function $\tilde f(u)$ is defined by

\[
\begin{aligned}
\tilde f(u) ={}& -\Omega(\Omega-1+2v)(1-z)^{\Omega}g(u)\\
 &+ \Omega\bigl(\Omega-1+2v(1+u)\bigr)(1-z)^{\Omega}f(u) + (1-2v)f(u)g(u),
\end{aligned}
\tag{23}
\]

and $\Phi(z,u,v)$, $\Omega$, $f(u)$ and $g(u)$ are given by equations (4),
(5), (10) and (11), respectively.

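The relevant expansions are now

\[
\begin{aligned}
\tilde f(0) &= -4(2v-1)^3(1-z)^{4v-2},\\
\tilde f'(0) &= 8v^2(2v-1)(1-z)^{2v-1}
  + O\bigl(\log(1-z)(1-z)^{4v-2}\bigr),\\
\tilde f''(0) &= \frac{8(v-1)v^3}{2v-1}
  + O\bigl(\log(1-z)(1-z)^{2v-1}\bigr),
\end{aligned}
\]

and

\[
N_uD_u^p(g(u))^{-2} = (-1)^p(p+1)!\,\frac{(g'(0))^p}{(g(0))^{p+2}}
  + O\Bigl(\frac{1}{(g(0))^{p+1}}\Bigr),
\]

which leads, for $p \ge 2$, eventually to

\[
\begin{aligned}
N_uD_u^p\tilde\Phi(z,u,v)
 ={}& \tilde f(0)\,N_uD_u^p(g(u))^{-2}
   + p\,\tilde f'(0)\,N_uD_u^{p-1}(g(u))^{-2}\\
 &+ \frac{p(p-1)}{2}\,\tilde f''(0)\,N_uD_u^{p-2}(g(u))^{-2}
   + O\Bigl(\frac{1}{(1-z)^{(p-1)(2v-1)}}\Bigr)\\
 ={}& \frac{(p-1)\,p!\,v\,\bigl(v/(2v-1)\bigr)^{2p-2}}{(1-z)^{p(2v-1)}}
   + O\Bigl(\frac{\log(1-z)}{(1-z)^{(p-1)(2v-1)}}\Bigr).
\end{aligned}
\tag{24}
\]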

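This gives then

\[
\begin{aligned}
\frac{1}{\binom{n}{p}}\,[z^nu^p]\,F(z,u,v)
 ={}& \frac{1}{\binom{n}{p}}\,[z^nu^p]\,
   \frac{1}{(1-z)^2}\int_{t=0}^{z}\tilde\Phi(t,u,v)\,dt\\
 ={}& \frac{1}{n^p}\,[z^n]\,\frac{1}{(1-z)^2}
   \int_{t=0}^{z}N_uD_u^p\tilde\Phi(t,u,v)\,dt\,
   \Bigl(1+O\Bigl(\frac{1}{n}\Bigr)\Bigr)\\
 ={}& \frac{1}{n^p}\,[z^n]\,\frac{1}{(1-z)^2}\int_{t=0}^{z}
   \Bigl(\frac{(p-1)\,p!\,v\,\bigl(v/(2v-1)\bigr)^{2p-2}}{(1-t)^{p(2v-1)}}
   + O\Bigl(\frac{\log(1-t)}{(1-t)^{(p-1)(2v-1)}}\Bigr)\Bigr)dt\,
   \Bigl(1+O\Bigl(\frac{1}{n}\Bigr)\Bigr).
\end{aligned}
\tag{25}
\]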

We get via singularity analysis

\[
\begin{aligned}
[z^n]\,\frac{1}{(1-z)^2}\int_{t=0}^{z}
  \frac{(p-1)\,p!\,v\,\bigl(v/(2v-1)\bigr)^{2p-2}}{(1-t)^{p(2v-1)}}\,dt
 ={}& \frac{(p-1)\,p!\,v\,\bigl(v/(2v-1)\bigr)^{2p-2}}{p(2v-1)-1}\,
   [z^n]\Bigl(\frac{1}{(1-z)^{p(2v-1)+1}} - \frac{1}{(1-z)^2}\Bigr)\\
 ={}& \frac{(p-1)\,p!\,v\,\bigl(v/(2v-1)\bigr)^{2p-2}\,n^{p(2v-1)}}
        {(p(2v-1)-1)\,\Gamma(p(2v-1)+1)}
   \Bigl(1+O\Bigl(\frac{1}{n^{1-\varepsilon}}\Bigr)\Bigr)
\end{aligned}
\tag{26}
\]

and

\[
[z^n]\,\frac{1}{(1-z)^2}\int_{t=0}^{z}
  O\Bigl(\frac{\log(1-t)}{(1-t)^{(p-1)(2v-1)}}\Bigr)dt
 = O\bigl(n^{p(2v-1)-1+\varepsilon}\bigr),
\tag{27}
\]

uniformly for $\varepsilon > 0$ and $|v-1| \le \tau := \varepsilon/2$.

Thus we obtain by combining the results (25)-(27) for $p \ge 2$ the
asymptotic expansion

\[
\frac{1}{\binom{n}{p}}\,[z^nu^p]\,F(z,u,v)
 = \frac{(p-1)\,p!\,v\,\bigl(v/(2v-1)\bigr)^{2p-2}\,n^{p(2v-2)}}
        {(p(2v-1)-1)\,\Gamma(p(2v-1)+1)}
   \Bigl(1+O\Bigl(\frac{1}{n^{1-\varepsilon}}\Bigr)\Bigr).
\tag{28}
\]

To apply the Quasi power theorem, we write (28) as

\[
\frac{1}{\binom{n}{p}}\,[z^nu^p]\,F(z,u,v)
 = \exp\Bigl(p(2v-2)\log n
   + \log\frac{(p-1)\,p!\,v\,\bigl(v/(2v-1)\bigr)^{2p-2}}
              {(p(2v-1)-1)\,\Gamma(p(2v-1)+1)}\Bigr)
   \times\Bigl(1+O\Bigl(\frac{1}{n^{1-\varepsilon}}\Bigr)\Bigr)
\tag{29}
\]

and then get, with the notation used in Theorem 3 and $v = e^s$,

\[
u(s) = p(2e^s-2), \qquad
v(s) = \log\Bigl(\frac{(p-1)\,p!\,e^s\bigl(e^s/(2e^s-1)\bigr)^{2p-2}}
                      {(p(2e^s-1)-1)\,\Gamma(p(2e^s-1)+1)}\Bigr), \qquad
\phi(n) = \log n, \qquad \kappa_n = n^{1-\varepsilon}.
\]

We have

\[
u'(0) = 2p, \qquad u''(0) = 2p,
\tag{30}
\]

and

\[
\begin{aligned}
v'(0) &= -2p\Psi(p+1) + 3 - 2p - \frac{2p}{p-1}
       = -2pH_p + 2p\gamma + 3 - 2p - \frac{2p}{p-1},\\
v''(0) &= -2p\Psi(p+1) - 4p^2\Psi'(p+1)
         + \frac{2(2p^3-5p^2+7p-2)}{(p-1)^2}\\
       &= -2pH_p + 2p\gamma - \tfrac{2}{3}\pi^2p^2 + 4p^2H^{(2)}_p
         + \frac{2(2p^3-5p^2+7p-2)}{(p-1)^2},
\end{aligned}
\tag{31}
\]

which leads now to the following result:

Theorem 5. The distribution of the random variable $Y_{n,p}$, which counts
the size of the spanning tree of $p$ randomly chosen nodes in a binary
search tree of size $n$, is for fixed $p \ge 2$ asymptotically Gaussian,
where the convergence rate is of order
$O\bigl(\frac{1}{\sqrt{\log n}}\bigr)$:

\[
\mathbb{P}\Bigl\{\frac{Y_{n,p} - 2p\log n}{\sqrt{2p\log n}} < x\Bigr\}
  = \Phi(x) + O\Bigl(\frac{1}{\sqrt{\log n}}\Bigr),
\]

and the expectation $E_{n,p} = \mathbb{E}(Y_{n,p})$ and the variance
$V_{n,p} = \mathbb{V}(Y_{n,p})$ satisfy

\[
E_{n,p} = 2p\log n - 2pH_p + 2p\gamma + 3 - 2p - \frac{2p}{p-1}
  + O\Bigl(\frac{1}{n^{1-\varepsilon}}\Bigr),
\]

\[
V_{n,p} = 2p\log n - 2pH_p + 2p\gamma - \tfrac{2}{3}\pi^2p^2
  + 4p^2H^{(2)}_p + \frac{2(2p^3-5p^2+7p-2)}{(p-1)^2}
  + O\Bigl(\frac{1}{n^{1-\varepsilon}}\Bigr).
\]

Of course, the case $p = 1$ is trivial, since we have
$\mathbb{P}\{Y_{n,1} = 1\} = 1$ due to the fact that the spanning tree of a
single node is the node itself.
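The expansion of the mean can be compared numerically with the exact expectation. The sketch below evaluates, in floating point, the exact expression $\frac{2p(n+1)^2}{(n+2-p)(n+1-p)}(H_n-H_p) + \frac{2(2p-1)(n+1)}{(n+2-p)(n+1-p)} + 3 + 2p - \frac{2pn}{n+1-p} + \frac{2p(n+1)(-1)^p}{\binom{n}{p}}\bigl(H_n + \sum_{k=1}^{p-1}\frac{(-1)^k}{k}\binom{n}{k}\bigr)$ for $\mathbb{E}(Y_{n,p})$ and the asymptotic main terms $2p\log n - 2pH_p + 2p\gamma + 3 - 2p - \frac{2p}{p-1}$; the function names and the choice $n = 10^5$ are made here for illustration.

```python
import math
from math import comb

EULER_GAMMA = 0.5772156649015329

def harmonic(n):
    return sum(1.0 / k for k in range(1, n + 1))

def exact_mean(n, p):
    """Exact E(Y_{n,p}) for 1 <= p <= n, evaluated in floating point."""
    h_n, h_p = harmonic(n), harmonic(p)
    alternating = sum((-1) ** k * comb(n, k) / k for k in range(1, p))
    denominator = (n + 2 - p) * (n + 1 - p)
    return (2 * p * (n + 1) ** 2 / denominator * (h_n - h_p)
            + 2 * (2 * p - 1) * (n + 1) / denominator
            + 3 + 2 * p
            - 2 * p * n / (n + 1 - p)
            + 2 * p * (n + 1) * (-1) ** p / comb(n, p) * (h_n + alternating))

def asymptotic_mean(n, p):
    """Main terms of E(Y_{n,p}) for fixed p >= 2."""
    return (2 * p * math.log(n) - 2 * p * harmonic(p) + 2 * p * EULER_GAMMA
            + 3 - 2 * p - 2 * p / (p - 1))

if __name__ == "__main__":
    n = 10 ** 5
    for p in (2, 3, 4):
        print(p, exact_mean(n, p), asymptotic_mean(n, p))
```

The remaining gap between both values shrinks as $n$ grows, in line with the stated error term.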

The case $p = 2$ is of particular interest, since $Y_{n,2}$ is, as described
earlier, up to an additive constant the distance $\Delta_n$ between two
randomly selected nodes in a binary search tree of size $n$, viz.
$\Delta_n \stackrel{(d)}{=} Y_{n,2} - 1$. This parameter was studied already
in [7], where the asymptotic normality of the distribution was shown by
means of a refined contraction method.

As an insightful referee remarks, one could also obtain the Gaussian limit
law for $Y_{n,p}$ (without the precision of the order of convergence obtained
here) by studying the difference between $X_{n,p}$ and $Y_{n,p}$, which is
the length of the path from the root of the binary search tree to the root
of the minimal spanning tree. This quantity is very small; for example, it
can be shown that it is zero with probability $1 - 2/(p+1)$ asymptotically
for $n \to \infty$ and $p \ge 2$. Since we gave already a detailed analysis
of $Y_{n,p}$ in this section, we will only describe, very briefly, how one
could proceed alternatively. It follows by comparing Theorem 4 and Lemma 1
that $\mathbb{E}(X_{n,p} - Y_{n,p}) = -2 + 2p/(p-1) + O(1/n^{1-\varepsilon})$.
One gets thus that
$\mathbb{P}\{X_{n,p} - Y_{n,p} > (\log n)^{1/4}\} = O((\log n)^{-1/4})$.
This bound finally suffices to transfer the limiting distribution result
from $X_{n,p}$ to $Y_{n,p}$ by considering
$\mathbb{P}\{(Y_{n,p} - 2p\log n)/\sqrt{2p\log n} < x\}
= \mathbb{P}\{(X_{n,p} - 2p\log n)/\sqrt{2p\log n}
- (X_{n,p} - Y_{n,p})/\sqrt{2p\log n} < x\}$.
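The two quantitative claims of this remark can be made plausible by a short heuristic computation. It assumes that the relative rank of the root is asymptotically uniform on $(0,1)$, so that the root fails to belong to the spanning tree only if it is not selected and all $p$ selected nodes fall into the same subtree; the second line is Markov's inequality applied to the nonnegative variable $X_{n,p} - Y_{n,p}$, whose mean is bounded.

```latex
\begin{align*}
\mathbb{P}\{X_{n,p}-Y_{n,p}=0\}
  &\longrightarrow 1-\int_0^1\bigl(x^p+(1-x)^p\bigr)\,dx
   = 1-\frac{2}{p+1},\\
\mathbb{P}\bigl\{X_{n,p}-Y_{n,p}>(\log n)^{1/4}\bigr\}
  &\le \frac{\mathbb{E}(X_{n,p}-Y_{n,p})}{(\log n)^{1/4}}
   = O\bigl((\log n)^{-1/4}\bigr),\\
0\le\frac{X_{n,p}-Y_{n,p}}{\sqrt{2p\log n}}
  &\le \frac{(\log n)^{-1/4}}{\sqrt{2p}}
   \quad\text{with probability } 1-O\bigl((\log n)^{-1/4}\bigr).
\end{align*}
```

Hence the correction term $(X_{n,p} - Y_{n,p})/\sqrt{2p\log n}$ tends to zero in probability, and the limit law of the normalized $X_{n,p}$ carries over to the normalized $Y_{n,p}$.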